📊 Bioinformatics Data Visualization
In this notebook, we'll demonstrate how to visualize different types of bioinformatics data:
- Sequence-level properties like length and GC content
- Simulated gene expression with a volcano plot
- 3D molecular structures using py3Dmol
We'll use real bacterial data and a known protein structure from the Protein Data Bank (PDB).
1️⃣ Sequence Feature Visualization
📥 Download E. coli CDS Sequences
We'll fetch the coding sequences (CDS) of E. coli K-12 MG1655 (assembly ASM584v2) from the NCBI FTP site as a gzipped FASTA file.
import urllib.request
url = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz"
urllib.request.urlretrieve(url, "ecoli_cds.fna.gz")
print("✅ Downloaded E. coli CDS FASTA")
✅ Downloaded E. coli CDS FASTA
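Re-running the cell above downloads the file again each time. A minimal sketch of a caching wrapper follows; the helper name `fetch_if_missing` is hypothetical and not part of the notebook. It skips the download when a non-empty local copy already exists:

```python
import os
import urllib.request


def fetch_if_missing(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper).

    Returns True if a download happened, False if the local copy was reused.
    """
    # Treat an existing, non-empty file as a valid cached copy.
    if os.path.exists(dest) and os.path.getsize(dest) > 0:
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

In the notebook this could replace the direct `urlretrieve` call, for example `fetch_if_missing(url, "ecoli_cds.fna.gz")`. Note that it does not verify that a cached file is complete.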
📊 Calculate Sequence Lengths and GC Content
Let's parse the gzipped FASTA file with Biopython and record each sequence's length and GC percentage.
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction
import gzip

seq_lengths = []
gc_contents = []

# Open the gzipped file in text mode so SeqIO can read it directly
with gzip.open("ecoli_cds.fna.gz", "rt") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        seq_lengths.append(len(record.seq))
        # gc_fraction returns a value between 0 and 1; scale it to a percentage
        gc_contents.append(gc_fraction(record.seq) * 100)

print(f"Parsed {len(seq_lengths)} sequences.")
Parsed 4315 sequences.
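To make the GC calculation concrete, here is a simplified pure-Python sketch of the quantity being computed. The function name `gc_percent` is an assumption for illustration; unlike Biopython's `gc_fraction`, it does not treat ambiguity codes specially.

```python
def gc_percent(seq):
    """Return GC content as a percentage of the sequence length.

    Simplified sketch: counts only unambiguous G and C (case-insensitive).
    """
    seq = seq.upper()
    if not seq:
        # Avoid dividing by zero for an empty sequence.
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_percent("ATGC"))
```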
📈 Plot: Length Distribution
Histogram of sequence lengths.
import matplotlib.pyplot as plt
plt.hist(seq_lengths, bins=50, color='lightblue')
plt.title("CDS Length Distribution")
plt.xlabel("Length (bp)")
plt.ylabel("Frequency")
plt.show()
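A few summary statistics can complement the histogram. The sketch below uses the standard-library `statistics` module on a small made-up list; in the notebook, `seq_lengths` would take the place of `example_lengths`.

```python
import statistics

# Toy lengths for illustration only; use seq_lengths in the notebook.
example_lengths = [300, 450, 900, 1200, 2400]

summary = {
    "min": min(example_lengths),
    "median": statistics.median(example_lengths),
    "mean": statistics.mean(example_lengths),
    "max": max(example_lengths),
}
print(summary)
```

A mean well above the median, as in this toy list, indicates a right-skewed distribution with a tail of long sequences.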
🌐 Plot: GC Content vs Length
Explore how GC% varies with sequence length.
plt.scatter(seq_lengths, gc_contents, alpha=0.5)
plt.title("GC Content vs CDS Length")
plt.xlabel("Length (bp)")
plt.ylabel("GC Content (%)")
plt.show()
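One way to put a number on the trend in the scatter plot is a Pearson correlation coefficient. The following is a minimal hand-written sketch (the name `pearson_r` is an assumption, not something the notebook defines); in the notebook it would be called as `pearson_r(seq_lengths, gc_contents)`.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Values near 0 suggest no linear relationship between length and GC%, while values near 1 or -1 suggest a strong one.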
2️⃣ Gene Expression Volcano Plot (Simulated)
🧪 Simulate Expression Data for Volcano Plot
Create a mock expression dataset to visualize differential expression.
import pandas as pd
import numpy as np
import plotly.express as px
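np.random.seed(42)  # fixed seed so the simulated data is reproducible

# 500 mock genes: normally distributed log2 fold changes, uniform p-values
df = pd.DataFrame({
    'log2FC': np.random.normal(0, 2, 500),
    'pval': np.random.uniform(0, 1, 500)
})
df['-log10(pval)'] = -np.log10(df['pval'])
# Significant: more than a 2-fold change (|log2FC| > 1) and p < 0.05
df['Significant'] = (df['log2FC'].abs() > 1) & (df['pval'] < 0.05)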
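The significance rule above can be spelled out for a single gene. This sketch splits significant genes into up- and down-regulated groups, which the notebook itself does not do; the function names and default cutoffs are assumptions that mirror the thresholds used above.

```python
import math


def classify_gene(log2fc, pval, fc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'not significant'."""
    if pval < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if pval < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "not significant"


def neg_log10(pval):
    """Y-axis value of a volcano plot for a given p-value."""
    return -math.log10(pval)


print(classify_gene(2.0, 0.01), neg_log10(0.01))
```

On the plot, the p < 0.05 cutoff corresponds to a horizontal line at about 1.3 on the y-axis, since -log10(0.05) ≈ 1.3.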
🌋 Plot: Simulated Volcano Plot
A scatter plot showing significance and fold change.
fig = px.scatter(df, x='log2FC', y='-log10(pval)', color='Significant', title="Simulated Volcano Plot")
fig.show()
3️⃣ Protein Structure: 3D Visualization with py3Dmol
📦 Download Hemoglobin Protein Structure (1A3N)
We’ll use this for 3D molecular visualization.
import urllib.request
url = "https://files.rcsb.org/download/1A3N.pdb"
urllib.request.urlretrieve(url, "1A3N.pdb")
print("✅ Downloaded 1A3N.pdb")
✅ Downloaded 1A3N.pdb
import warnings
from Bio import BiopythonWarning
from Bio.PDB import PDBParser
# Suppress Biopython warnings
warnings.simplefilter('ignore', BiopythonWarning)
# Load and parse the structure
parser = PDBParser()
structure = parser.get_structure("Hemoglobin", "1A3N.pdb")
# Print chain IDs
print("Chains in structure:")
for model in structure:
    for chain in model:
        print(" - Chain ID:", chain.id)
Chains in structure:
 - Chain ID: A
 - Chain ID: B
 - Chain ID: C
 - Chain ID: D
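To illustrate where the chain IDs come from, the sketch below reads them straight from the fixed-column ATOM records of the PDB format (chain ID in column 22, residue number in columns 23-26). It is a rough illustration, not Biopython's implementation; the helper names and the synthetic records are assumptions made for this example.

```python
from collections import defaultdict


def residues_per_chain(pdb_lines):
    """Count distinct residue numbers per chain from ATOM records.

    Insertion codes and HETATM records are ignored in this sketch.
    """
    residues = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain_id = line[21]          # column 22 (1-based)
            res_seq = int(line[22:26])   # columns 23-26 (1-based)
            residues[chain_id].add(res_seq)
    return {chain: len(nums) for chain, nums in residues.items()}


def make_atom_line(serial, atom, resname, chain, resseq):
    """Build a truncated, synthetic ATOM record (illustration only)."""
    return f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


toy_records = [
    make_atom_line(1, "N", "VAL", "A", 1),
    make_atom_line(2, "CA", "VAL", "A", 1),
    make_atom_line(3, "N", "LEU", "A", 2),
    make_atom_line(4, "N", "VAL", "B", 1),
]
print(residues_per_chain(toy_records))
```

With the downloaded file, the same function could be applied to `open("1A3N.pdb")`, which yields the lines of the file.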
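🧬 Interactive 3D Viewer Setup with py3Dmol
Let's visualize the protein in 3D using a cartoon model. Note that `query='pdb:1A3N'` fetches the structure from the RCSB by its ID rather than reading the local 1A3N.pdb file.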
import py3Dmol
view = py3Dmol.view(query='pdb:1A3N')
view.setStyle({'cartoon': {'color': 'spectrum'}})
view.zoomTo()
view.show()
🔬 Tip: Annotating Protein Regions (Optional)
To highlight active sites, ligands, or domains, you can add selections like:
view.addStyle({'chain': 'A', 'resn': 'HEM'}, {'stick': {}})
You can also add labels and spheres to residues for educational demos or reports.
import py3Dmol
view = py3Dmol.view(query='pdb:1A3N')
view.addStyle({'chain': 'A', 'resn': 'HEM'}, {'stick': {}})
view.zoomTo()
view.show()
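As a sketch of how sphere highlights for individual residues might be organized, the helper below builds the selection and style dictionaries that `view.addStyle` accepts. The helper name, the default color and radius, and the residue numbers are placeholders chosen only to show the call pattern, not known sites of interest in 1A3N.

```python
def residue_highlight(chain, resi, color="orange", radius=1.0):
    """Return (selection, style) dicts for one residue (hypothetical helper)."""
    selection = {"chain": chain, "resi": resi}
    style = {"sphere": {"color": color, "radius": radius}}
    return selection, style


# Placeholder residue numbers, used only to demonstrate the pattern.
specs = [residue_highlight("A", resi) for resi in (10, 20)]

# In the notebook, each pair would be applied to an existing viewer:
# for selection, style in specs:
#     view.addStyle(selection, style)
print(specs[0])
```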
🧬 Interactive Protein Viewer with Chain and Style Selection
This viewer lets you:
- Select a chain (A, B, C, or D)
- Choose a visual style (cartoon, stick, or surface)
import py3Dmol
import ipywidgets as widgets
from IPython.display import display
# Define available options
chains = ['A', 'B', 'C', 'D']
styles = ['cartoon', 'stick', 'surface']
# Create widgets
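chain_selector = widgets.Dropdown(
    options=chains,
    value='A',
    description='Chain:',
    style={'description_width': 'initial'}
)
style_selector = widgets.Dropdown(
    options=styles,
    value='cartoon',
    description='Style:',
    style={'description_width': 'initial'}
)

# Function to update viewer
def update_viewer(chain_id, style):
    view = py3Dmol.view(query='pdb:1A3N')
    # Draw the whole structure in grey, then highlight the selected chain in red
    view.setStyle({'cartoon': {'color': 'lightgrey'}})
    if style == 'surface':
        # A surface is not an atom style in 3Dmol.js; it is added with addSurface
        view.addSurface(py3Dmol.VDW, {'color': 'red', 'opacity': 0.8}, {'chain': chain_id})
    else:
        view.addStyle({'chain': chain_id}, {style: {'color': 'red'}})
    view.zoomTo()
    view.show()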
# Display widgets together
ui = widgets.HBox([chain_selector, style_selector])
out = widgets.interactive_output(update_viewer, {'chain_id': chain_selector, 'style': style_selector})
display(ui, out)